Automating the Generation of Semantic Annotation Schema Using a Clustering Technique
نویسندگان
چکیده
In order to generate semantic annotations for a collection of documents, one needs an annotation schema consisting of a semantic model (a.k.a. ontology) along with lists of linguistic indicators (keywords and patterns) for each concept in the ontology. The focus of this paper is the automatic generation of the linguistic indicators for a given semantic model and a corpus of documents. Our approach needs a small number of user-defined seeds and bootstraps itself by exploiting a novel clustering technique. The baseline for this work is the Cerno project [8] and the clustering algorithm LIMBO [2]. We also present results that compare the output of the clustering algorithm with linguistic indicators created manually for two case studies.
منابع مشابه
Semantic-Based Image Retrial in the VQ Compressed Domain using Image Annotation Statistical Models
متن کامل
Towards a Surface Realization-Oriented Corpus Annotation
Until recently, deep stochastic surface realization has been hindered by the lack of semantically annotated corpora. This is about to change. Such corpora are increasingly available, e.g., in the context of CoNLL shared tasks. However, recent experiments with CoNLL 2009 corpora show that these popular resources, which serve well for other applications, may not do so for generation. The attempts...
متن کاملAn Improved Semantic Schema Matching Approach
Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Process (OLAP), Data mining, semantic web [2] and schema integration. This task is defined for finding the semantic correspondences between elements of two schemas. Recently, schema matching has found considerable interest in both research and practice. In this paper, we present a new impr...
متن کاملLearning Semantic Parsers Using Statistical Syntactic Parsing Techniques
Most recent work on semantic analysis of natural language has focused on “shallow” semantics such as word-sense disambiguation and semantic role labeling. Our work addresses a more ambitious task we call semantic parsing where natural language sentences are mapped to complete formal meaning representations. We present our system SCISSOR based on a statistical parser that generates a semanticall...
متن کاملClustering Schema Elements for Semantic Integration of Heterogeneous Data Sources
Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important step in integrating the data sources. This article proposes a cluster analysis based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. The authors apply multiple clus...
متن کامل